Fix issue 370 #414

Open · wants to merge 1 commit into master
Conversation

@h33p (Contributor) commented Jul 2, 2024

Closes #370

The issue is triggered because QEMU/virsh-generated ELF dumps place the strtab at the end of the dump file. Such dumps are large, so it is infeasible to provide the entire file to the parser. This PR relaxes the parser's requirements and returns a default strtab if the initial size check fails.
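The PR's idea can be sketched in isolation. Below is a minimal stand-in (the `Strtab` type and `parse_strtab_relaxed` function are illustrative, not goblin's actual internals): if the strtab's declared byte range lies beyond the chunk of the file we were handed, fall back to an empty default table rather than failing the whole parse.

```rust
// Stand-in for goblin's Strtab; the real type wraps raw bytes similarly.
#[derive(Debug, Default, PartialEq)]
struct Strtab(Vec<u8>);

// Hypothetical sketch of the PR's relaxed behavior: an out-of-bounds
// strtab range yields a default (empty) table instead of an error.
fn parse_strtab_relaxed(bytes: &[u8], offset: usize, len: usize) -> Strtab {
    offset
        .checked_add(len)
        .and_then(|end| bytes.get(offset..end))
        .map(|slice| Strtab(slice.to_vec()))
        // The strtab lives past the end of the provided chunk
        // (e.g. the tail of a huge QEMU dump): default it.
        .unwrap_or_default()
}
```

This trades an early parse error for an empty table, which is exactly the trade-off questioned below.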
@m4b (Owner) left a comment
Hmm, I'm not sure this is the best approach to fixing the issue.

In general, if the file is huge, I'd expect it to be memmap'd and lazily paged in as the user iterates/accesses parts. That should be sufficient to parse a huge coredump file without crashing.

Secondly, unless I've misunderstood, if the strtab is out of bounds, the file is malformed; unless only a chunk of the file is being loaded and passed to goblin, but goblin's parse was never really designed for that use case. In that case, the user should probably use lazy_parse, since strtab will be defaulted there.

Lastly, I'm also not sure this really fixes any issue; won't subsequent lookups all fail due to the empty strtab, so this is really just kicking the can down the road?

@h33p (Contributor, Author) commented Jul 16, 2024

@m4b thank you for your reply!

> Secondly, unless I've misunderstood, if the issue is that the strtab is out of bounds, this means that the file is malformed, unless only a chunk of the file is being loaded and passed to goblin;

Exactly. Just as in #370, only the head chunk is passed to goblin. The file itself is not malformed.

> In that case, probably the user should use lazy_parse, since strtab will get defaulted there.

That is probably the right way to go about this. My use case is only reading the program headers, nothing more.

This means I can just use ProgramHeader::parse; however, it requires a scroll Ctx. Is there a neat way to access/build that without copying the logic in parse_misc?
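For context on what that Ctx carries: ELF parsing context boils down to two bytes of the identification array, EI_CLASS (32- vs 64-bit container) and EI_DATA (endianness). The following self-contained illustration uses stand-in types (this is not goblin's API, just the ELF-spec logic behind it):

```rust
// e_ident byte offsets, per the ELF specification.
const EI_CLASS: usize = 4; // 1 = ELFCLASS32, 2 = ELFCLASS64
const EI_DATA: usize = 5;  // 1 = ELFDATA2LSB, 2 = ELFDATA2MSB

// Stand-in for the container/endianness pair a scroll Ctx encodes.
#[derive(Debug, PartialEq)]
struct Ctx {
    is_64: bool,
    little_endian: bool,
}

// Derive the parsing context from the leading bytes of an ELF file.
fn ctx_from_ident(ident: &[u8]) -> Option<Ctx> {
    Some(Ctx {
        is_64: *ident.get(EI_CLASS)? == 2,
        little_endian: *ident.get(EI_DATA)? == 1,
    })
}
```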

@m4b (Owner) commented Jul 21, 2024

Ok, I see your issue. So, this is primarily because lazy_parse isn't fleshed out in a consistent way. If I understand correctly, you would do:

```rust
let mut elf = Object::lazy_parse(bytes)?;
// how to populate program_headers?
```
So I think the best path forward here will be one of:

  1. exposing the ctx by making the field pub, or adding a pub fn ctx() (I don't want to do this)
  2. adding some helper methods, especially for e.g. program_headers and section_headers, that take care of passing the various fiddly bits (I think this is ideal, but I'm not sure whether to use a builder-esque pattern or mutating methods on the Elf struct)

To flesh out option 2 a bit, I mean something like:

```rust
pub fn parse_program_headers(&mut self, bytes: &[u8]) -> error::Result<()> {
    let program_headers = ProgramHeader::parse(
        bytes,
        self.header.e_phoff as usize,
        self.header.e_phnum as usize,
        self.ctx,
    )?;

    // Extract the interpreter path from PT_INTERP, if present.
    let mut interpreter = None;
    for ph in &program_headers {
        if ph.p_type == program_header::PT_INTERP && ph.p_filesz != 0 {
            let count = (ph.p_filesz - 1) as usize;
            let offset = ph.p_offset as usize;
            interpreter = bytes
                .pread_with::<&str>(offset, ::scroll::ctx::StrCtx::Length(count))
                .ok();
        }
    }

    self.program_headers = program_headers;
    self.interpreter = interpreter;
    Ok(())
}
```

Obviously this would forward to either private static methods that return the appropriate types, reducing code duplication, or we'd use &mut self methods to construct/parse whichever components we want.

So there are a number of things I would change at this point, which is somewhat unrelated to your request to just parse the program headers:

  1. We should probably introduce a builder or a &mut self API
  2. lazy_parse should have taken some bytes, parsed the header, and defaulted the rest
  3. parse calls lazy_parse, and then uses the above builder API to construct the remaining structs
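The lazy-parse-then-populate flow sketched in these steps can be illustrated with toy stand-in types (none of this is goblin's actual API): parse only the fixed-size header up front, default everything else, then let the caller feed in just the bytes covering each component.

```rust
#[derive(Debug, Default)]
struct Elf {
    phnum: usize,              // stand-in for header.e_phnum
    program_headers: Vec<u32>, // defaulted until explicitly parsed
}

impl Elf {
    // Parse only the "header" (here: a single length byte); default the rest.
    fn lazy_parse(bytes: &[u8]) -> Option<Self> {
        let phnum = *bytes.first()? as usize;
        Some(Elf { phnum, ..Default::default() })
    }

    // Later, the caller supplies just the bytes covering the phdr table,
    // so the full dump never has to be loaded.
    fn parse_program_headers(&mut self, bytes: &[u8]) -> Option<()> {
        self.program_headers = bytes.get(..self.phnum)?.iter().map(|&b| b as u32).collect();
        Some(())
    }
}
```

The eager `parse` would then be a thin wrapper that calls `lazy_parse` followed by each `parse_*` method.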

I know this probably isn't what you wanted to hear, and while I understand your patch may "fix" the issue for you, it unfortunately isn't the right solution to the problem. Specifically, you want to lazy_parse, then supply just, e.g., the program_header bytes and parse from those to get the information you need.

Fortunately, I don't think this is a particularly hard problem to fix in a better manner; as I sort of outlined above, it's just slightly more work :) Let me know if you have any questions, and thanks for your patience and for explaining your use case!

Successfully merging this pull request may close these issues.

Failing to parse QEMU memory dump note .shstrtab